
    Using Pareto Fronts to Evaluate Polyp Detection Algorithms for CT Colonography

    We evaluate and improve an existing curvature-based region growing algorithm for colonic polyp detection in our CT colonography (CTC) computer-aided detection (CAD) system by using Pareto fronts. The performance of a polyp detection algorithm involves two conflicting objectives: minimizing both the false negative (FN) and the false positive (FP) detection rates. This problem has no single optimal solution but rather a set of solutions known as a Pareto front. Relative to any other solution in the front, a Pareto-optimal solution can be better in only one of the two competing objectives; improving one objective necessarily worsens the other. Using evolutionary algorithms to find Pareto fronts for multi-objective optimization problems has been common practice for years. However, such algorithms are rarely investigated in CTC CAD systems because their computational cost is inherently high. To circumvent this problem, we have developed a parallel program that runs on a Linux cluster. A data set of 56 CTC colon surfaces with 87 proven positive detections of polyps sized 4 to 60 mm is used to evaluate an existing one-step region growing algorithm and to derive a new two-step one. We use a popular algorithm, the Strength Pareto Evolutionary Algorithm (SPEA2), to find the Pareto fronts. The performance differences are evaluated using a statistical approach. The new algorithm outperforms the old one in 81.6% of the sampled Pareto fronts from 20 simulations. When operated at a suitable sensitivity level, such as 90.8% (79/87) or 88.5% (77/87), the new algorithm reduces the FP rate by 24.4% or 45.8%, respectively.
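    To make the Pareto-front idea concrete, here is a minimal Python sketch that keeps only the non-dominated (FN, FP) operating points from a list of candidate settings, with both objectives minimized. The candidate points are hypothetical, not results from the study, and the sketch is a plain non-dominated filter rather than SPEA2 itself; the last line reproduces the sensitivity arithmetic quoted in the abstract.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false negatives, false positives per scan), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, deduplicated and sorted by the first objective."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


def sensitivity_pct(detected: int, total: int) -> float:
    """Sensitivity as a percentage of proven positive detections."""
    return 100.0 * detected / total


if __name__ == "__main__":
    # Hypothetical operating points produced by different threshold settings.
    candidates = [(8, 5.0), (10, 3.1), (8, 6.2), (12, 3.0), (10, 4.0), (16, 1.5)]
    print(pareto_front(candidates))  # [(8, 5.0), (10, 3.1), (12, 3.0), (16, 1.5)]
    print(round(sensitivity_pct(79, 87), 1), round(sensitivity_pct(77, 87), 1))  # 90.8 88.5
```

    Points such as (8, 6.2) drop out because another setting matches their FN count with fewer FPs; every surviving point trades one objective against the other.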

    Improvement of computerized mass detection on mammograms: Fusion of two-view information

    Peer Reviewed: http://deepblue.lib.umich.edu/bitstream/2027.42/135080/1/mp6098.pd

    Is this model reliable for everyone? Testing for strong calibration

    In a well-calibrated risk prediction model, the average predicted probability is close to the true event rate for any given subgroup. Such models are reliable across heterogeneous populations and satisfy strong notions of algorithmic fairness. However, auditing a model for strong calibration is well known to be difficult, particularly for machine learning (ML) algorithms, because of the sheer number of potential subgroups. As a result, common practice is to assess calibration only with respect to a few predefined subgroups. Recent developments in goodness-of-fit testing offer potential solutions, but they are not designed for settings with weak signal or where the poorly calibrated subgroup is small, as they either overly subdivide the data or fail to divide it at all. We introduce a new testing procedure based on the following insight: if observations are reordered by their expected residuals, then the association between the predicted and observed residuals should change along this sequence whenever a poorly calibrated subgroup exists. This reframes calibration testing as a changepoint detection problem, for which powerful methods already exist. We begin by introducing a sample-splitting procedure in which one portion of the data is used to train a suite of candidate models for predicting the residual, and the remaining data are used to perform a score-based cumulative sum (CUSUM) test. To further improve power, we then extend this adaptive CUSUM test to incorporate cross-validation while maintaining Type I error control under minimal assumptions. Compared with existing methods, the proposed procedure consistently achieved higher power in simulation studies and more than doubled the power when auditing a mortality risk prediction model.
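    The reordering insight can be illustrated with a simplified Python sketch, assuming binary outcomes and a residual score produced by a model fitted on a separate data split. The function names, the max-absolute CUSUM statistic, and the Monte Carlo null (redrawing outcomes from Bernoulli(p), which is valid only when the score does not depend on the test outcomes) are illustrative assumptions; this is not the authors' score-based, cross-validated procedure.

```python
import numpy as np


def cusum_stat(y: np.ndarray, p: np.ndarray, order: np.ndarray) -> float:
    """Max absolute partial sum of residuals y - p in the given order,
    standardized by the Bernoulli variance expected under perfect calibration."""
    resid = (y - p)[order]
    scale = np.sqrt(np.sum(p * (1.0 - p)))
    return float(np.max(np.abs(np.cumsum(resid))) / scale)


def calibration_cusum_test(y, p, residual_score, n_sim=1000, seed=0):
    """Order observations by predicted residual (largest first) and compare the
    CUSUM statistic with replicates whose outcomes are redrawn from Bernoulli(p).
    residual_score is assumed to come from a model fitted on a separate split."""
    y, p, residual_score = (np.asarray(a, dtype=float) for a in (y, p, residual_score))
    rng = np.random.default_rng(seed)
    order = np.argsort(-residual_score)
    observed = cusum_stat(y, p, order)
    null = np.array([cusum_stat(rng.binomial(1, p), p, order) for _ in range(n_sim)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_sim)
    return observed, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 2000
    x = rng.uniform(size=n)
    p = np.full(n, 0.3)                                  # model predicts 30% for everyone
    y = rng.binomial(1, np.where(x > 0.8, 0.55, 0.3))    # but the x > 0.8 subgroup is at 55%
    # x stands in for a trained residual predictor here (an oracle, for illustration only).
    stat, p_value = calibration_cusum_test(y, p, residual_score=x)
    print(f"CUSUM statistic {stat:.2f}, p-value {p_value:.4f}")
```

    Ordering by the residual score front-loads the miscalibrated subgroup, so the partial sums drift early and then flatten; that drift-then-flatten shape is the changepoint the CUSUM picks up.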

    Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach

    The area under the receiver operating characteristic curve is often used as a summary index of diagnostic ability when evaluating biomarkers whose clinical outcome (truth) is binary. When the clinical outcome is a right-censored survival time, the C index, motivated as an extension of the area under the receiver operating characteristic curve, was proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistically comparing two diagnostic or predictive systems, which could be either two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics-based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of type I error rate and power. We also compare our one-shot method with resampling methods, including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides nearly unbiased variance estimates and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study.
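    For reference, Harrell's C counts usable pairs in which the subject who failed first also had the higher marker value. The Python sketch below computes it and, as a baseline, compares two markers with a paired bootstrap z-test; the bootstrap is one of the resampling comparators mentioned above, not the proposed one-shot variance estimator. The function names are hypothetical, a higher marker is assumed to mean higher risk, and pairs tied in time are simply skipped.

```python
from statistics import NormalDist

import numpy as np


def harrell_c(time, event, marker) -> float:
    """Harrell's C: over usable pairs (subject i has an event and subject j is observed
    longer), the share where i has the higher marker; marker ties count as 1/2.
    Pairs tied in time are skipped in this simple version."""
    time, event, marker = (np.asarray(a) for a in (time, event, marker))
    concordant, usable = 0.0, 0
    for i in np.flatnonzero(event):
        later = time > time[i]                    # subjects still at risk when i fails
        usable += int(later.sum())
        concordant += np.sum(marker[i] > marker[later]) + 0.5 * np.sum(marker[i] == marker[later])
    return float(concordant / usable)


def compare_c_bootstrap(time, event, m1, m2, n_boot=500, seed=0):
    """Paired bootstrap z-test for C(m1) - C(m2); subjects are resampled with replacement."""
    time, event, m1, m2 = (np.asarray(a) for a in (time, event, m1, m2))
    rng = np.random.default_rng(seed)
    diff = harrell_c(time, event, m1) - harrell_c(time, event, m2)
    n = len(time)
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # same resample for both markers keeps them paired
        boot.append(harrell_c(time[idx], event[idx], m1[idx])
                    - harrell_c(time[idx], event[idx], m2[idx]))
    z = float(diff / np.std(boot, ddof=1))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 200
    risk = rng.normal(size=n)
    t_event = rng.exponential(scale=np.exp(-risk))   # higher risk -> earlier failure
    t_cens = rng.exponential(scale=2.0, size=n)
    time, event = np.minimum(t_event, t_cens), t_event <= t_cens
    noise = rng.normal(size=n)                       # an uninformative comparator marker
    print(compare_c_bootstrap(time, event, risk, noise))
```

    Resampling whole subjects and evaluating both markers on the same resample is what captures the correlation between the two C estimates, the quantity the one-shot method instead estimates analytically.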

    Validating Pareto Optimal Operation Parameters of Polyp Detection Algorithms for CT Colonography

    We evaluated a Pareto front-based multi-objective evolutionary algorithm for optimizing our CT colonography (CTC) computer-aided detection (CAD) system. The system identifies colonic polyps based on curvature-based and volumetric features, and the evolutionary algorithm optimized a set of thresholds for these features. We used a two-fold cross-validation (CV) method to test whether the optimized thresholds generalize to new data sets. We performed the CV method on 133 patients, each with a prone and a supine scan. There were 103 colonoscopically confirmed polyps, resulting in 188 positive detections in CTC reading from the prone scan, the supine scan, or both. In the two-fold CV, we randomly divided the 133 patients into two cohorts. Each cohort in turn served as the training cohort, on which a multi-objective genetic algorithm obtained a Pareto front; the resulting set of optimized thresholds was then applied to the other (test) cohort to obtain test results. This process was carried out twice so that each cohort was used once for training and once for testing. We averaged the two training Pareto fronts to form our final training Pareto front and averaged the test results from the two runs as our final test results. Our experiments demonstrated that the averaged test results were close to the mean Pareto front determined during training. We conclude that the Pareto front-based algorithm appears to generalize to new test data.
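    The patient-level two-fold cross-validation described above can be sketched generically in Python. The fit and evaluate callables are hypothetical placeholders for the genetic-algorithm optimizer and the CAD scoring step, and the sketch tracks a single operating point rather than averaging whole Pareto fronts. Splitting by patient keeps each patient's prone and supine scans in the same cohort.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def two_fold_cv(
    patients: Sequence,
    fit: Callable[[List], object],
    evaluate: Callable[[object, List], Dict[str, float]],
    seed: int = 0,
) -> Tuple[Dict[str, float], Tuple[List, List]]:
    """Randomly split patients (not individual scans) into two cohorts, optimize
    parameters on one cohort, test them on the other, swap roles, and average the
    two test results metric by metric. Also returns the cohorts for inspection."""
    ids = list(patients)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    cohorts = (ids[:half], ids[half:])
    results = []
    for train, test in (cohorts, cohorts[::-1]):
        params = fit(train)                       # hypothetical: thresholds from a multi-objective GA
        results.append(evaluate(params, test))    # hypothetical: {"sensitivity": ..., "fp_per_scan": ...}
    averaged = {k: mean(r[k] for r in results) for k in results[0]}
    return averaged, cohorts


if __name__ == "__main__":
    # Toy stand-ins: each "patient" is a score, fit uses the training mean as a threshold,
    # and evaluate reports the fraction of test patients above that threshold.
    def toy_fit(train: List[float]) -> float:
        return mean(train)

    def toy_evaluate(threshold: float, test: List[float]) -> Dict[str, float]:
        return {"frac_above": sum(s > threshold for s in test) / len(test)}

    scores = [0.1, 0.4, 0.35, 0.8, 0.55, 0.2, 0.9, 0.65]
    print(two_fold_cv(scores, toy_fit, toy_evaluate))
```

    With 133 patients the split yields cohorts of 66 and 67, so each patient is tested exactly once across the two folds.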